Digital spatial profiling of intraductal papillary mucinous neoplasms: Toward a molecular framework for risk stratification

The histopathologic heterogeneity of intraductal papillary mucinous neoplasms (IPMN) complicates the prediction of pancreatic ductal adenocarcinoma (PDAC) risk. Intratumoral regions of pancreaticobiliary (PB), intestinal (INT), and gastric foveolar (GF) epithelium may occur with either low-grade dysplasia (LGD) or high-grade dysplasia (HGD). We used digital spatial RNA profiling of dysplastic epithelium (83 regions) from surgically resected IPMN tissues (12 patients) to differentiate subtypes and predict genes associated with malignancy. The expression patterns of PB and GF lesions diverged from INT, suggesting that PB and GF arise from a common lineage. Transcriptional dysregulation within PB lesions mirrored that of PDAC, whereas INT and GF foci did not. Tumor necrosis factor/nuclear factor κB (TNF-NFκB) and cell cycle (cycling S and cycling G2-M) programs occurred with relative prominence in PB and INT subtypes, respectively. Together, this study delineates markers of high-risk IPMN and insights into malignant progression.


INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC) remains a leading cause of cancer death, predominantly because of the lack of early detection strategies that enable identification of patients at a potentially curable stage (1). Intraductal papillary mucinous neoplasms (IPMN) are cystic lesions of the pancreas that represent a radiographically detectable precursor to pancreatic cancer (2). While the most IPMN do not progress to malignancy, the ability to accurately differentiate those lesions at low risk for progression [lowgrade dysplasia (LGD)] from those at high risk [high-grade dysplasia (HGD) and early cancer] remains elusive (3). It is widely accepted that operative treatment of HGD is appropriate, whereas radiographic surveillance is appropriate for those with LGD. Consensus guidelines designed to predict risk of HGD through clinical, radiographic, laboratory, endoscopic, and cytologic parameters have an overall accuracy of approximately 60% (4)(5)(6). Hence, there has been intense interest in the development of more accurate biomarkers for high-risk IPMN.
Pathologic characterization of IPMN demonstrates multiple dysplastic histologic epithelial subtypes that often coexist within individual specimens. Epithelial subtypes observed histologically include pancreaticobiliary (PB), intestinal (INT), gastric foveolar (GF), and an oncocytic variant. The PB and INT subtypes comprise the overwhelming majority of IPMN with propensity for invasive cancer, and GF represents an indolent subtype associated with favorable prognosis. Studies comparing patient outcomes by subtype suggest that PB histology is more likely to harbor or progress to malignancy, and patients with invasive lesions from the PB subtype experience similar outcomes as patients with conventional PDAC (7)(8)(9). Genomic interrogation of IPMN has implicated DNA alterations in KRAS, GNAS, and RNF43 as the most prevalent events within neoplasms, but these mutations may coexist and do not reliably associate with histologic subtype or grade of dysplasia (10)(11)(12). Mutations such as TP53, CDKN2A, SMAD4, and others may be predictors of HGD or invasive carcinoma when present, but their prevalence is relatively low, and the sensitivity is limited. Numerous studies have proposed possible mRNA, microRNA, and protein biomarkers for high-risk IPMN, but to date, none has been incorporated into clinical use because of prognostic inaccuracies (13)(14)(15)(16)(17)(18).
Intralesional heterogeneity further complicates the discovery of markers for high-risk IPMN. Studies of bulk tissue or cyst fluid convolute the mosaic of epithelial subtypes and grades of dysplasia that occur within individual patients, thereby obfuscating any underlying signal that may be present in disease foci. Microdissection can isolate and compare regions of dysplastic epithelium, but these approaches can be technically challenging and disrupt tissue quality (19,20). Single-cell RNA sequencing (RNA-seq) of IPMN can characterize unique cell populations within bulk tissues, but do so at the expense of their spatial relationships, and are thus unable to distinguish differences between histologic subtype and pathologic grade (21).
The emergence of multiplex digital spatial profiling addresses the above challenges in deconvoluting heterogeneous tissues to delineate disease phenotypes (22,23). This technology offers precise comparison of gene expression among user-defined disease regions without the need for cumbersome microdissection. We recently used this technology to explore the composition of immune cells within the IPMN tumor microenvironment (24). Here, we report spatial RNA profiling of ductal epithelium across subtypes to determine markers of dysplasia and identify biological processes that associate with malignant progression. underwent pancreatectomy for IPMN between 2017 and 2021 (Fig. 1A). Clinicopathologic details of the cohort are summarized in Table 1. A pancreatic pathologist (C.S.) characterized the specimens and prepared tissue blocks for profiling according to the following criteria: (i) predominantly INT (n = 6) or PB (n = 6) histology, (ii) presence of at least HGD within the specimen, and (iii) adequate areas of LGD and HGD within a single tissue block to facilitate a controlled comparison between grades of dysplasia (Fig. 1B). Because of the heterogeneity of the tumors, the specimens also contained numerous regions of GF epithelia with uniformly LGD. Invasive carcinoma occurred in four of six PB specimens and none of the INT specimens. To compensate for this potential bias, we prepared slides from areas of the specimen that entirely lacked invasive carcinoma.

Targeted spatial transcriptome profiling
We used the Nanostring GeoMx Cancer Transcriptome Atlas (CTA) platform to profile a total of 98 epithelial areas of interest (AOIs) (50 HGD and 48 LGD) from all slides (6 to 10 per slide;  S1C). Epithelial regions with HGD tended to contain greater cellular density than LGD AOIs, leading to increased sequencing counts and signal-to-noise area under the curve (snAUC) (Materials and Methods and fig. S1D). Of the epithelial subtypes, INT harbored the greatest cellular density, followed by PB and finally GF AOIs (fig. S1E). Sequencing yield and snAUC reflect these trends. The normalization strategy to account for this potential bias is discussed below.
A median of 48.3% of probes was expressed above background in each individual AOI (IQR 42.7 to 55.2%; table S2). Gene filtering retained 1288 of 1828 (70.4%) genes expressed at detectable levels above background in at least 20% of AOIs ( fig. S2A and table S3). The raw count density distributions of the AOIs indicate that the filtered genes were of low abundance ( fig. S2B). Background subtraction followed by quantile normalization was applied to account for differences in sequencing yield, enabling biological comparison of gene expression across AOIs (tables S4 and S5). The normalized count density distributions across individual AOIs were grossly similar ( fig. S3A), and there was no observable bias by either pathologic grade or epithelial histology ( fig. S3, B and C).

Divergence of INT from PB and GF subtypes
An unbiased survey of expression patterns was performed using principal components analysis. The AOIs formed two groups that corresponded to histologic subtype, with INT AOIs forming one group and PB-GF (non-INT type) AOIs forming the second group ( Fig. 2A). The separation of AOIs by grade of dysplasia along the first two principal components varied among specimens (Fig. 2B). Of the INT patients, the AOIs from slides 4 and 12 grouped tightly together, suggesting homogeny. Two patients (slides 1 and 3) that contained a mixture of GF (LGD) and INT (HGD) epithelium were notable because the GF and INT regions clustered with their respective histologic subtypes, rather than by patient. Notable separation between HGD and LGD AOIs occurred in INT slides 2 and 9. Invasive carcinoma, although excluded from slides prepared for digital spatial profiling, occurred in four specimens with PB histology (slides 5, 6, 8, and 11) (Fig. 2C). The HGD AOIs from these four patients grouped closely together, suggesting the existence of a high-risk phenotype. Of the two PB specimens lacking invasive carcinoma, one grouped with the other PB AOIs (slide 7) and other grouped more closely with GF AOIs (slide 10).  S3). The union of these sets comprised 127 histologic subtype-specific genes (Fig. 3A). Hierarchical clustering of the AOIs produced distinct clusters corresponding to INT, GF, and PB histology. Clustering at the gene level further revealed three clusters denoted as C1-INT, C2-PB, and C3-PB-GF (non-INT). The AOIs from slides 2 and 10 did not cluster with their respective epithelial subtypes. Rather, slide 2 clustered with PB AOIs despite having a mix of INT and GF epithelium, and the PB-HGD AOIs from slide 10 clustered with GF (LGD). When describing histologic subtype marker genes across the three subtypes, the presence of known IPMN marker genes served as external validation for the DE results. Mucin genes, including MUC1, MUC2, MUC4, MUC6, and MUC5AC, have been proposed to discriminate between subtypes (14,15), and the NanoString CTA probe set contained two of these mucin genes, MUC1 and MUC4. Consistent with prior reports, MUC1 was overexpressed in PB-GF (non-INT) AOIs, and MUC4 was overexpressed in INT AOIs (Fig. 3, B and C) (14,15,(25)(26)(27). At the RNA level, neither MUC1 nor MUC4 were found to be significantly altered in PB relative to GF AOIs. Notably, several genes outperformed MUC1 and MUC4 as IPMN subtype marker genes, including clusterin (CLU), retinol binding protein 4 (RBP4), and keratin 17 (KRT17), which showed marked overexpression in GF, INT, and PB AOIs, respectively (Fig. 3C).
Given that IPMN with HGD represents PDAC in situ, we postulated that gene expression changes essential to the progression to HGD should be retained in PDAC (8). To investigate this rationale, we compared DE genes in HGD versus LGD IPMN with PDAC gene sets curated from external sources: (i) Clinical Proteomic Tumor Analysis Consortium (CPTAC) RNA-seq and proteomics analysis of pancreatic cancer versus normal adjacent tissues, (ii) the Human Protein Atlas (HPA) analysis of The Cancer Genome Atlas (TCGA) RNA-seq data predicting prognostic genes in pancreatic cancer, (iii) Mao et al. analysis of bulk RNA-seq data from PDAC versus adjacent benign tissue, and (iv) Grützmann et al. meta-analysis of pancreatic cancer microarray experiments (see Materials and Methods) (28)(29)(30)(31). The analyses from CPTAC and HPA (TCGA) were considered more robust as these were generated by large consortia using standardized protocols. Of the 30 genes overexpressed in HGD versus LGD IPMN, 6 were overexpressed in PDAC (CPTAC RNA-seq) and associated with unfavorable prognosis (HPA): FOSL1, AREG, CCL20, SERPINB5, ITGA2, and LAMC2 ( fig. S4). Additional genes found to be overexpressed in both IPMN and PDAC but not associated with prognosis included PLAUR, IL1R2, KRT17, EPHA2, LIF, CD55, and CARD11.
Gene Set Enrichment Analysis (GSEA) demonstrated significant enrichment [adjusted P value (P adj ) < 0.05] against all curated gene sets ( fig. S5 and table S7). Statistical significance largely depended on gene set size, with down-regulated genes exhibiting weaker enrichment than up-regulated genes. The most significant enrichment   (32). Genes were ranked by log fold change in HGD versus LGD IPMN. This resulted in 7 of 50 significantly enriched gene sets (P adj < 0.01). The gene set with the highest enrichment score represented genes regulated by nuclear factor κB (NFκB) in response to tumor necrosis factor (TNF) signaling (TNF-NFκB). We interpreted this result as supportive of the known link between inflammatory signaling and progression in IPMN (33)(34)(35)(36). An additional three of seven enriched gene sets pertained to cell proliferation. Leading edge analysis of the genes enriched in two or more genes sets yielded two unique clusters (Fig. 4E). The first cluster included MKI67 and other genes associated with cell proliferation and division, and the second cluster involved genes associated with inflammatory signaling, hypoxia, and epithelial-tomesenchymal transition.  (Fig. 5A, fig. S6, and table S6). Carcinoembryonic antigen-related cellular adhesion molecule 6 (CEACAM6), which has been reported as a possible biomarker in IPMN and cholangiocarcinoma, was highly up-regulated in PB but not INT IPMN (Fig. 5B) (21,37). In addition, laminin subunit beta-3 (LAMB3) and POU class 5 homeobox 1 (POU5F1) were specific to PB-HGD but not INT-HGD AOIs. Early growth response-1 (EGR1) was more specific to INT, although met significance thresholds in the grouped analysis of HGD versus LGD dysplasia.

Association of IPMN subtypes with PDAC
Subgroup DE analysis was performed on pairwise combinations of histopathologic groups (six comparisons), and the resulting ranked gene lists were tested against the curated PDAC gene sets (table S7). The PB-HGD versus PB-LGD analysis had the greatest absolute normalized enrichment score (NES) in 8 of 10 gene sets and outperformed analyses involving INT-HGD regions for every gene set (Fig. 5C). Of particular significance was the stark difference in NES for the HPA Prognosis gene sets. All comparisons involving PB-HGD AOIs were highly enriched, whereas none of the INT-HGD comparisons showed significant enrichment. This suggests that PB-HGD IPMN most closely resembles invasive carcinoma and may represent a direct precursor to malignancy in the dysplastic progression of IPMN.

Up-regulation of inflammatory signaling and cell proliferation during dysplastic progression
Exploratory GSEA against MSigDB Hallmark gene sets was replicated for each of the six pairs of histopathologic groups ( Fig. 5D and table S8). Cell proliferation programs (S and G 2 -M phases) were upregulated in INT-LGD relative to PB-LGD (GF) AOIs. This was concordant with the greater cell density observed within INT regions on microscopic examination and reflected in the initial QC analysis (figs. S1E and S8). By contrast, the TNF-NFκB transcriptional program was up-regulated in PB-LGD relative to INT-LGD AOIs. Both PB-HGD and INT-HGD regions overexpressed both TNF-NFκB and proliferation pathways relative to their LGD    counterparts. Together, these results suggest that INT may arise as a primarily proliferative lesion before acquiring the TNF-NFκB program and progressing to HGD. By contrast, PB-LGD (GF) may arise in the setting of inflammatory signaling and acquire the capacity to proliferate during progression to HGD. No significant hallmark pathways distinguished PB-HGD from INT-HGD, suggesting that transcriptional programs not encompassed by MSigDB Hallmark gene sets must account for the marked differences between PB-HGD and INT-HGD.

Unsupervised network analysis to delineate gene clusters associated with progression to carcinoma
To determine whether gene expression patterns could infer biological pathway activity and cancer risk without reliance on pathological annotation, we performed coexpression network analysis (Materials and Methods). This resulted in a network containing 300 genes connected by 791 edges (Fig. 6A and tables S9 and S10). Unsupervised clustering partitioned the network into 29 clusters of coexpressed genes. We discarded clusters with <10 genes (23 of 29) for which enrichment testing would be underpowered, leaving 6 communities for further investigation.
Hypergeometric enrichment testing was used to associate clusters with gene sets from the MSigDB Hallmark database and Gene Ontology Biological Processes (GO:BP; table S11) (38). After consolidating redundant biological processes, we selected the most statistically significant gene sets to represent each cluster (Fig. 6A). We then projected the standardized mean gene expression across PB-HGD and INT-HGD AOIs onto the network using color gradient overlays (Fig. 6B). To relate network clusters with PDAC, we performed enrichment testing against PDAC gene sets and visualized the CPTAC RNA-seq DE genes as a third network overlay (Fig. 6C).
Clusters C4 and C7 warranted consideration as contributors to malignant progression based on significant enrichment for CPTAC RNA-seq PDAC genes (C4 22 of 46 genes, P adj = 3.7 × 10 −6 ; C7 25 of 37 genes, P adj = 8.3 × 10 −12 ). Cluster C4 was significantly enriched for TNF-NFκB signaling (17 of 46 genes, P adj = 4.4 × 10 −6 ) and contained 15 of 30 genes overexpressed in HGD versus LGD IPMN. Genes within C4 tended to be expressed at higher levels in PB relative to INT IPMN. Accordingly, C4 harbored 15 genes overexpressed in PB-HGD versus PB-LGD AOIs, compared to only 6 genes overexpressed in INT-HGD versus INT-LGD AOIs. Cluster C7 was significantly enriched for cell cycle genes (28 of 37 genes, P adj = 2.1 × 10 −18 ). Genes within C7 tended to be expressed at higher levels in INT relative to PB IPMN and contained more genes up-regulated in INT versus GF AOIs (22 of 37 genes) than PB versus GF AOIs (6 of 37 genes). In contrast to C4, C7 contained none of the DE genes in HGD versus LGD IPMN. Together, these results implicate C4 (TNF-NFκB) as a prominent transcriptional program in PB IPMN and C7 (cell cycle) as a prominent program in INT IPMN.

DISCUSSION
Management of patients with IPMN presents an opportunity to prevent pancreatic cancer; however, current management strategies are limited in their ability to provide accurate recommendations because our ability to predict timing of progression is limited. Cytologic or pathologic confirmation of disease subtype and/or grade of dysplasia is difficult without operative resection, and resection is associated with substantial morbidity and even mortality. Asymptomatic patients who present without high-risk radiographic stigmata of carcinoma represent a clinical conundrum for which no accurate diagnostic modalities currently exist. For this patient population, the concept of a prognostic molecular assay holds great promise, but despite considerable investigation, no such assay has gained clinical traction.
Available retrospective evidence suggests an association between histopathology and clinical outcomes in patients with IPMN. Specifically, PB histologic subtype portends poor prognosis relative to INT, GF appears to represent an indolent entity, and the presence of HGD forecasts the development of invasive cancer (7,8,(39)(40)(41). Numerous efforts have failed to translate these retrospective pathologic observations into predictive biomarkers for technical and disease-related reasons. The major caveat with "bulk" studies of IPMN tissue rests in the assignment of a single grade and subtype annotation to a specimen, effectively homogenizing the disease morphologies and degrees of dysplasia that occur within the affected pancreas. In an unsupervised clustering analysis of single-cell RNA-seq data from patients with IPMN, Bernard et al. (21) found subpopulations of cells from tissues designated LGD within clusters of HGD and carcinoma cells, corroborating histopathologic evidence that IPMN harbor a mixture of cells along a dysplastic spectrum. Ultimately, our understanding of the neoplastic progression of IPMN hinges upon the ability to characterize these tissues in a more granular fashion.
Until now, attempts to isolate neoplastic ductal epithelium required technically challenging tissue handling. Jury et al. (20) combined laser capture microdissection with microarray technology to study gene expression changes associated with IPMN progression. The investigators reported markers of pancreatic islet cells, including the hormones insulin, glucagon, and somatostatin, among the most significantly differentially expressed genes in their dataset, raising doubt that the microdissection procedure precisely isolated neoplastic epithelium. Sato et al. (19) used selective microdissection paired with microarrays to compare gene expression between normal ductal epithelia, noninvasive IPMN, and invasive IPMN. The authors reported several genes also found by our study, including serpin family B member 5 (SERPINB5), CD55 molecule (Cromer Blood Group) (CD55), and integrin subunit alpha 2 (ITGA2). However, the study exclusively examined regions of carcinoma in situ (HGD) and lacked classification by epithelial subtype, limiting its clinical translational potential. As an alternative to tissue isolation, IPMN cyst fluid-rich in DNA, RNA, and protein-can be obtained with minimally invasive fine needle aspiration. However, the acquired material constitutes a convolution of secreted molecules and sloughed off debris from normal and neoplastic pancreatic tissue. Therefore, biomarker identification from either cyst fluid or bulk tissue present similar challenges.
In the current study, we leveraged digital spatial RNA profiling using a targeted gene panel to characterize precise regions of IPMN histopathology across tissue slides and produce robust gene expression patterns by epithelial subtype and grade. Unsupervised dimensionality reduction analysis demonstrated distinct groups of INT and non-INT (PB-GF) AOIs. INT and GF AOIs clustered apart even when derived from the same patient and in the same tissue section. By contrast, the intimate association of PB and GF AOIs suggests that GF and PB share a common neoplastic cell lineage distinct from INT lesions. Given that GF regions are almost universally deemed as low grade, we suggest that GF can essentially be considered a precursor to PB epithelium. Now, mucin genes serve as a generally accepted differential marker of epithelial subtype. The two mucin genes (MUC1 and MUC4) included in the CTA panel showed significant association with PB-GF and INT subtypes, respectively. However, despite the intended use of MUC1 as a specific marker of PB IPMN, it did not discriminate between PB and GF AOIs in this dataset. Rather, CLU, RBP4, KRT17, and other candidate marker genes nominated by this analysis displayed superior potential to classify epithelial subtypes.
Comparison of gene expression by grade of dysplasia identified gene expression alterations that mirrored those reported in PDAC. Evaluation of gene expression enrichment against independent datasets also served as validation of the spatial profiling platform and our data analysis approach. Clustering the AOIs on the set of 38 genes dysregulated in HGD versus LGD partitioned the specimens into high-risk and low-risk groups. Eight of the 12 slides, including the 4 slides from patients with invasive carcinoma, harbored one or more high-risk AOIs. We envision that a gene expression classifier derived from this gene signature could serve as a risk stratification tool for patients with IPMN.
This study also corroborates evidence for individual candidate biomarkers of high-risk IPMN, including CD55, laminin subunit gamma 2 (LAMC2), amphiregulin (AREG), and others. As noted above, CD55 was previously reported as a biomarker for IPMN by gene expression microarray experiments and was found to be associated with disease progression in a proteomic profiling study of IPMN cyst fluid (19,42). An enzyme-linked immunosorbent assay of LAMC2 in plasma of patients with PDAC augmented the accuracy of the widely used PDAC biomarker CA- 19-9 (43). LAMC2 has also been detected in pancreatic duct fluid exosomes in patients with IPMN and PDAC (44). AREG was found to be predictive of high-risk IPMN in a serum biomarker panel based on antibody microarray technology and has also been detected in pancreatic cyst fluid (45,46). Validation studies using these and other candidate genes in pancreatic cyst fluid are warranted.
To explore the biological underpinnings of IPMN, we performed a combination of supervised and unsupervised analyses. The supervised analysis leveraged annotation of AOIs by a pancreatic pathologist, DE testing, and GSEA to find biological associations. The unsupervised analysis used gene expression correlation information to construct a coexpression network and a community detection algorithm to partition the network into clusters of highly correlated genes. Clusters were then assessed for biological significance through hypergeometric enrichment testing. Ultimately, the two analysis approaches led to similar conclusions. Two transcriptional programs appear to be driving neoplastic progression in IPMN: inflammatory signaling (TNF-NFκB) and cell proliferation (S and G 2 -M phases). Activation of cell proliferation was more prominent in INT relative to PB lesions, whereas inflammatory signaling was more pronounced in PB than INT. Malignant potential, assessed by enrichment of gene alterations shared with PDAC datasets, was predominantly associated with PB epithelium. The unsupervised network analysis corroborated these findings, yielding a 46gene cluster associated with the constellation of PB epithelium, inflammatory signaling, and PDAC genes, without a priori knowledge of pathologic annotations. Measurement of the transcriptional activity of this gene signature could transcend the histopathologic designations that currently serve as surrogate measures of malignancy risk.
This study has several important limitations. First, the CTA probe panel used in this study measures only~10% of human protein coding genes. A targeted panel reduces the power of GSEA and hypergeometric testing alike to detect significant biological pathway enrichment. Genes not measured by the CTA panel may outperform the marker genes nominated by this study or provide evidence of other transcriptional programs with relevance in IPMN. Incorporation of a more comprehensive panel will be important in future studies.
Second, our modest cohort size of 12 specimens may not embody the breadth of disease biology across the common epithelial subtypes (PB, GF, and INT) and omits rare entities such as oncocytic IPMN. In addition, the co-occurrence of invasive carcinoma in most of the PB cohort and none of the INT cohort confounds our comparison of the two subtypes. Our attempt to mitigate this confounder by requiring all tissue slides to be devoid of invasive carcinoma may or may not be compensatory. Certainly, the available clinical outcome data support our findings: Patients with invasive carcinoma derived from INT fare far better than patients with PB-derived carcinomas (7). Expanded profiling that includes specimens with and without invasive carcinoma from both subtypes will be needed to fully resolve this issue.
A third and related limitation of the cohort design was the requirement that every specimen in the study has regions of HGD. It is conceivable that regions of LGD from specimens lacking HGD/invasive carcinoma could be different from areas of LGD found in conjunction with HGD/invasive carcinoma. The idea that high-risk IPMN could be detected before the development of HGD would certainly alter our clinical approach to the disease. Expanded spatial profiling that includes regions of normal ductal epithelium and invasive carcinoma could address this intriguing possibility.
In summary, our findings offer several refinements to our understanding of IPMN. First, GF epithelium likely represents a precursor to PB rather than a common progenitor to either PB or INT. Second, the activation of inflammatory signaling associates with high-risk IPMN and occurs predominantly in PB lesions. This finding lends credence to ongoing clinical trials of anti-inflammatory therapies in the prevention of IPMN progression. Last, the incorporation of subtype-specific and high-risk marker genes nominated by this study may facilitate the development of an accurate risk stratification assay in IPMN.

Patient recruitment
Archival biospecimens from patients who had undergone pancreatic resection for IPMN at Duke University Hospital System between 2017 and 2021 were considered for spatial RNA profiling. The Institutional Review Board approved the use of deidentified patient specimens for retrospective molecular profiling. Informed consent was not required because of the retrospective nature of the study with minimal risk and deidentification of the specimens. Clinicopathological data were collected by study coordinators and securely stored in the REDCap database. Archived FFPE specimens were procured and reviewed by a board-certified pathologist specializing in pancreatic pathology to confirm diagnosis (C.S.). Specimen blocks were cut into 5-μm-thick serial sections. One section was stained with hematoxylin and eosin and imaged using a Nikon TE2000-E microscope for pathology review. Sections containing regions of both LGD and HGD were selected for spatial transcriptomics and mounted on a positively charged slide for this application.
Spatial RNA profiling Digital spatial RNA profiling was conducted using the NanoString GeoMx Digital Spatial Profiler (DSP) (22). Our pathologist (C.S.) selected regions of interest (ROIs) annotated by histologic subtype (PB, GF, and INT) and grade of dysplasia (LGD or HGD). We selected ROIs that encompassed the spectrum of subtype-grade combinations present on each slide and included multiple biological replicates of each combination. The GeoMx DSP imposes a maximum ROI diameter of 700 μm. Individual ROIs were drawn to maximize the number of epithelial cells contained while adhering to the size constraint. Segmental profiling of individual cell populations within each ROI was performed by staining the tissues with fluorescently conjugated antibodies: CD45 for immune cells, smooth muscle actin (SMA) for stromal fibroblasts, and anti-pan-cytokeratin (PanCK) for epithelial cells. DSP tissue slides were incubated with the fluorescently conjugated antibodies to CD45, PanCK, and SMA along with a cocktail of photocleavable oligonucleotide probes from the GeoMx CTA kit. Segmentation thus produced multiple AOIs from each ROI. Libraries were prepared according to the NanoString GeoMx Library Preparation Manual and pooled to equimolar concentration. RNA was sequenced under standard conditions on an Illumina NovaSeq 6000 to a depth of 30 read pairs/μm 2 .

Quality control and normalization
Ultraviolet-cleaved barcode sequencing reads were processed by the GeoMx DSP Analysis Server. Processing steps included read trimming, alignment, and deduplication. A tabulated matrix of probe counts for each AOI was exported from the DSP server and subsequently analyzed using R version 4.2.0.
QC metrics for each AOI were computed, including the receiver operating characteristic curve and the associated AUC metric of gene probes (N = 8584) versus negative probes (N = 75) (47). This was used to denote the snAUC. A limit of detection (LOD) for each AOI was set to the 90th percentile count of negative probes. A probe count below this LOD was considered undetectable. The following QC criteria were used to filter AOIs: (i) >100,000 aligned deduplicated total counts and (ii) snAUC > 0.65. AOIs that did not meet these QC criteria were removed.
The CTA kit features multiple probes per gene (8584 independent probes targeting 1829 unique gene identifiers). Probes with undetectable expression in >80% of AOIs were removed, and the geometric mean of representative probe counts was computed to produce a single-expression value for each gene. Raw gene counts were normalized in two steps to account for differences in background levels and sequencing output across AOIs. The geometric mean of negative probe counts, a measure of background level, was subtracted from each AOI. Background-subtracted counts were scaled by total library size and subjected to quantile normalization (48).

Data analysis
Analysis of filtered, normalized gene expression data was performed in the R language with Bioconductor (49). Dimensionality reduction analysis was performed with principal components analysis. DE analysis was conducted using the limma package (50). Criteria for calling DE genes included absolute log 2 fold change > 1.0 and P adj < 0.05. Heatmap plots were generated using the pheatmap R package with row and column clustering using the "ward.D2" method (51). GSEA was conducted using the Bioconductor package fgsea (52)(53)(54). Genes were ranked by their log 2 fold change in each DE analysis (e.g., HGD versus LGD). Gene sets from MSigDB version 7.5.1. were obtained from the msgidbr package (54,55).

Curation of external datasets
External RNA-seq analysis results from the National Cancer Institute's CPTAC proteogenomic characterization of pancreatic adenocarcinoma were obtained from table S3 (28). Genes were merged by official gene symbol. Genes with an absolute log 2 fold change ≥ 1.0 and P adj < 0.05 were considered differentially expressed and included in gene sets. RNA-seq analysis from Mao et al. (29) comparing PDAC versus matched normal pancreas was curated into gene sets (log 2 fold change ≥ 1.0, P adj < 0.01). Prognostic pathology information from the HPA resource was downloaded from the HPA website (www.proteinatlas.org/download/pathology.tsv.zip) (30). Genes with prognostic significance were merged by official gene symbol and curated into gene sets.

Coexpression network analysis
Gene-gene coexpression analysis was performed by computing the Spearman correlation matrix of all genes, as well as a null distribution of correlation coefficients from 10,000 random permutations of the gene expression values. Multiple testing correction was applied using the qvalue package in Bioconductor (56). Coexpressed gene pairs were designated as having absolute correlation coefficient ≥ 0.7 and q < 0.01. Community detection was performed using Leiden clustering with a resolution parameter of 0.5 (57). Visualizations of the resulting correlation network were produced by Gephi using the ForceAtlas 2 layout algorithm (58). Edge weights were set to the correlation coefficient raised to the fourth power. Cluster enrichment analysis was performed using the "enricher" function from the clusterProfiler package (59).

Supplementary Materials
This PDF file includes: Fig S1 to S8 Legend for data file S1 Other Supplementary Material for this manuscript includes the following: Data file S1